THE PROBLEM

To  identify  the  potential  relevance  of  the  signal  envelope  for  aural  classification. 

THE  FINDINGS 

Listeners  were  trained  to  classify  a  set  of  sounds  into  eight  categories.  Classification  was 
almost  as  good  in  subsequent  tasks  where  listeners  classified  signal  envelopes  or  signals 
created  by  modulating  a  tone  with  the  signal  envelopes.  Classification  of  signals  created 
by  modulating  a  tonal-complex  or  broadband  noise  was  markedly  worse,  probably  due  to 
interaction  of  sidebands  from  nearby  carrier  frequencies. 

APPLICATION 

These  results  indicate  that  further  investigation  of  envelope  features  and  aural  sensitivity 
to  these  features  would  further  our  understanding  of  aural  classification  of  brief  complex 
sounds. 
ABSTRACT


Listeners  were  trained  to  classify  a  set  of  sounds  into  eight 
categories.  Classification  was  almost  as  good  in  subsequent  tasks  where 
listeners  classified  signal  envelopes  or  signals  created  by  modulating  a  tone 
with  the  signal  envelopes.  Classification  of  signals  created  by  modulating  a 
tonal-complex  or  broadband  noise  was  markedly  worse,  probably  due  to 
interaction  of  sidebands  from  nearby  carrier  frequencies.  These  results 
suggest  the  importance  of  envelope  cues  for  aural  classification.  Further 
investigation  of  envelope  features  and  aural  sensitivity  to  these  features 
would  further  our  understanding  of  aural  classification  of  brief  complex 
sounds.


Most  psychoacoustic  research  has  concentrated  on  describing  the  auditory 
system's  ability  to  detect  signals  or  to  discriminate  small  changes  in  simple 
stimuli  such  as  tones  or  bands  of  noise.  More  recently,  increasing  attention 
is  being  paid  to  the  auditory  system's  ability  to  classify  sounds  so  that  we 
can  better  understand  the  acoustic  features  that  underlie  aural  recognition  of 
complex  real-world  sounds.  Many  acoustic  features  are  potentially  available 
within  the  auditory  system,  but  an  analysis  of  some  of  these  features  can  be 
simplified  by  considering  one  aspect  of  the  acoustic  waveform  -  the  amplitude 
envelope.  The  signal  envelope  is  an  amplitude-time  function  that  describes 
the  signal's  amplitude  variation  distinct  from  the  spectral  content.  Thus, 
for  example,  a  two-tone  signal  consisting  of  500  and  504  Hz  would  produce  a 
"beating"  sensation  due  to  its  periodic  4  Hz  amplitude  variation,  as 
would  a  two-tone  signal  consisting  of  800  and  804  Hz  or  any  pair  of 
frequencies  separated  by  4  Hz. 
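
The beating example can be checked numerically. The sketch below assumes two equal-amplitude cosines with zero starting phase, whose sum can be written as 2 cos(pi*(f2 - f1)*t) * cos(pi*(f1 + f2)*t); the envelope magnitude then depends only on the frequency separation. The function names are illustrative and are not taken from the report.

```python
import math

def two_tone_envelope(f1_hz, f2_hz, t_s):
    """Envelope magnitude of cos(2*pi*f1*t) + cos(2*pi*f2*t) at time t_s.

    Assumes equal amplitudes and zero starting phase for both tones.
    """
    return abs(2.0 * math.cos(math.pi * (f2_hz - f1_hz) * t_s))

def beat_rate_hz(f1_hz, f2_hz):
    """Rate of the envelope fluctuation, equal to the frequency separation."""
    return abs(f2_hz - f1_hz)

if __name__ == "__main__":
    for pair in [(500.0, 504.0), (800.0, 804.0)]:
        print(pair, "beat rate:", beat_rate_hz(*pair), "Hz")
```

Under these assumptions the 500/504 Hz and 800/804 Hz pairs have the same envelope at every instant, although their fine structure differs.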

Using  two-tone  complexes,  Buus  (1983)  has  shown  that  the  auditory  system 
can  discriminate  envelope  fluctuations  up  to  at  least  640  Hz.  Using  a  noise 
carrier modulated by the speech envelope, Van Tassell et al. (1987) have shown
that  envelope  frequencies  less  than  200  Hz  (and  possibly  higher)  provide  the 
information  needed  for  the  discrimination  of  certain  speech  features.  The 
present  study  examines  the  role  of  the  envelope  for  the  perception  of 
nonspeech  sounds. 


METHOD 

Signals.  Fifty  one-second  segments  were  extracted  from  digitized  recordings 
of  underwater  sounds.  Each  segment  contained  an  event  with  a  duration  ranging 
from  tens  to  hundreds  of  milliseconds.  Each  event  was  approximately  centered 
within  the  one-second  sample.  The  recordings  had  been  digitized  at  a  12.5  kHz 
sampling  rate  with  12  bits  of  linear  encoding  of  amplitude.  A  preliminary 
classification of the fifty events into eight categories was performed based
on transcripts of the recording sessions and in consultation with two
experienced sonar operators who listened to the recordings in their original
context (prior to extraction). For each of the eight categories, three
exemplars were chosen that represented good-quality samples with minimal
ambiguity regarding the accuracy of the classification.

Classification  performance  was  measured  using  the  original  set  of  fifty 
signals  and  four  other  sets  that  were  derived  from  the  signal  envelopes  of  the 
original  set.  Envelopes  were  extracted  from  the  digitized  signals  using 
the  Interactive  Laboratory  System  (ILS)  from  Signal  Technology  Inc.  In  the 
first  of  the  envelope  conditions,  the  fifty  envelopes  were  used  as  signals, 
i.e.,  the  envelope  was  presented  as  a  time  waveform;  in  the  second  condition, 
the  envelopes  were  used  to  modulate  a  multitonal  complex  consisting  of  tones 
with  approximately  one-third  octave  spacing;  in  the  third  condition,  the 
envelopes  were  used  to  modulate  a  broadband  noise  carrier;  and  in  the  fourth 
condition,  the  envelopes  were  used  to  modulate  a  3-kHz  tonal  carrier. 
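
The report does not describe how ILS computed the envelopes, so the following is only a sketch of one common approach: full-wave rectification followed by a one-pole low-pass filter, with the result multiplied by a 3-kHz sinusoid as in the fourth envelope condition. The function names, the 200-Hz smoothing cutoff, and the decaying test signal are assumptions made for illustration.

```python
import math

def extract_envelope(signal, fs_hz, cutoff_hz=200.0):
    """Full-wave rectify, then smooth with a one-pole low-pass filter.

    Assumed method; the ILS algorithm used in the study is not specified.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs_hz)
    envelope, state = [], 0.0
    for x in signal:
        state += alpha * (abs(x) - state)
        envelope.append(state)
    return envelope

def modulate_tone(envelope, fs_hz, carrier_hz=3000.0):
    """Multiply the envelope by a sinusoidal carrier, sample by sample."""
    return [e * math.sin(2.0 * math.pi * carrier_hz * n / fs_hz)
            for n, e in enumerate(envelope)]

if __name__ == "__main__":
    fs = 12500.0  # sampling rate reported for the recordings
    # Hypothetical one-second test signal: a 1-kHz tone with decaying amplitude.
    sig = [math.exp(-n / 2000.0) * math.sin(2.0 * math.pi * 1000.0 * n / fs)
           for n in range(12500)]
    env = extract_envelope(sig, fs)
    out = modulate_tone(env, fs)
    print(len(out), round(max(env), 3))
```

The tonal-complex and noise conditions would replace the single sinusoid with a sum of tones or a band-limited noise sequence; the multiplication step is the same.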

Apparatus. For the condition with the original set of signals and for the
first of the envelope conditions, the sounds were presented over 16-bit
digital-to-analog converters with a 12.5 kHz sampling rate. Stimuli were
low-pass filtered at 5 kHz. All filters in this experiment had asymptotic
rejection rates of 115 dB/octave.

In the modulated-carrier conditions, the envelope was presented over 16-bit
digital-to-analog converters and low-pass filtered at 5 kHz (3 kHz for the
tonal carrier). The carriers for the three conditions were: 1) a
tonal-complex, consisting of 500, 650, 800, 1000, 1250, 1600, 2000, 2500, 3150,
4000, and 5000 Hz tones (with arbitrary starting phases), which was presented
over a second digital-to-analog converter, 2) broadband noise, generated from
a white-noise generator and filtered from 0.5 to 5.0 kHz, and 3) a 3-kHz tone,
generated from an oscillator. The carrier was multiplied by the envelope and
filtered from 0.5 to 5.0 kHz (0 to 6 kHz for the tone carrier).

In all conditions a programmable attenuator was used to adjust the
amplitude  of  each  signal  to  a  comfortable  listening  level.  In  addition,  the 
programmable  attenuator  was  used  to  randomize  stimulus  levels  over  a  15-dB 
range  to  minimize  the  use  of  amplitude  as  a  classification  cue.  An  electronic 
switch  gated  the  stimuli  with  20-ms,  sine-squared  ramping.  Stimuli  were 
presented  to  the  right  earphone  of  a  Sennheiser  HD430  headset. 
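
The gating and level randomization described above can be sketched in software. The report used an electronic switch and a programmable attenuator; the digital version below is an illustration only. The uniform distribution of attenuation over the 15-dB range is an assumption (the report gives the range but not the distribution), and the function names are hypothetical.

```python
import math
import random

def sine_squared_gate(n_samples, fs_hz, ramp_ms=20.0):
    """Per-sample gain: sin^2 onset ramp, unity plateau, mirrored offset ramp."""
    n_ramp = int(round(fs_hz * ramp_ms / 1000.0))
    gate = [1.0] * n_samples
    for n in range(min(n_ramp, n_samples // 2)):
        g = math.sin(0.5 * math.pi * n / n_ramp) ** 2
        gate[n] = g
        gate[n_samples - 1 - n] = g
    return gate

def roved_gain(rng=None, range_db=15.0):
    """Linear gain for an attenuation drawn uniformly from 0 to range_db (assumed)."""
    rng = rng or random
    attenuation_db = rng.uniform(0.0, range_db)
    return 10.0 ** (-attenuation_db / 20.0)

def gate_and_rove(signal, fs_hz, rng=None):
    """Apply the 20-ms gate and one random level to a whole stimulus."""
    gain = roved_gain(rng)
    gate = sine_squared_gate(len(signal), fs_hz)
    return [gain * g * x for g, x in zip(gate, signal)]
```

Drawing one gain per stimulus, rather than per sample, keeps the envelope shape intact while making overall level an unreliable cue.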

Procedure. 

Table  I  shows  the  order  of  the  training  and  testing  phases. 

Table I

Order of training and testing.

Initial training
Training for condition 1      (original signals)
Testing for condition 1
Training for condition 2      (envelopes)
Testing for condition 2
Retesting of condition 1      (exemplars only)
Training for condition 3      (modulated tone complex)
Testing for condition 3
Retesting of condition 1      (exemplars only)
Training for condition 4      (modulated noise)
Testing for condition 4
Retesting of condition 1      (exemplars only)
Training for condition 5      (modulated tone)
Testing for condition 5


Initial  training:  On  each  trial,  one  stimulus  was  presented  and  the 
listener  classified  it  using  the  letters  A-H  to  designate  the  eight 
categories.  The  correct  response  was  displayed  after  each  response.  Only  the 
exemplars  from  categories  one  through  four  were  presented  within  half  of 
the  blocks  and  only  exemplars  from  categories  five  through  eight  were 
presented  within  the  other  blocks.  These  reduced  sets  were  used  to  facilitate 
the  learning  of  category  labels.  Listeners  had  at  least  670  trials  for  each 
of these two stimulus sets, by which time they had attained stable performance.
Following  initial  training,  condition-specific  training  and  testing  occurred 
for  the  five  conditions  sequentially. 

Condition-specific training: For each condition, listeners were trained
with  the  full  set  of  eight  categories.  Only  the  three  exemplars  from  each 
category  were  presented  during  this  training,  and  correct-answer  feedback  was 
given  on  each  trial.  One  day  of  training,  consisting  of  at  least  720  trials, 
was  conducted  prior  to  switching  to  the  test  condition. 

Testing: Each of the fifty stimuli was presented once within a block of
fifty trials. If an exemplar was presented, correct-answer feedback was
given. However, if the stimulus was one of the twenty-six stimuli that were
not exemplars (such stimuli will be called probe stimuli), no feedback was
given. Normally thirty-six blocks of fifty trials were administered during
the test phase. These blocks were run over two days.

All listeners were tested simultaneously, which prevented
counterbalancing  the  conditions  across  listeners.  Conditions  were  run  in  the 
following  order:  original  sonar  signals,  envelopes,  modulated  tone-complex, 
modulated  noise,  and  modulated  tone.  Before  each  of  the  modulated-carrier 
conditions,  listeners  were  retested  on  the  original  set  of  signals  for  one 
session  (approximately  864  trials)  to  monitor  performance  in  this  baseline 
condition.  Only  exemplars  were  used  for  retesting. 

Listeners. Three paid volunteers and the author (identified as L2 in the
Tables)  served  as  listeners.  Each  had  normal  hearing  sensitivity  (less  than 
15  dB  HL  at  octave  frequencies  from  250  to  8000  Hz).  The  author  had  been 
involved  in  the  stimulus  preparation  and  was  very  familiar  with  the  signals 
prior  to  testing.  The  other  three  subjects  had  never  heard  these  sounds  prior 
to  this  experiment. 


RESULTS 

Table II shows percent correct on the original signals. Performance is
stable except for a small improvement for L1. L4's performance is markedly
poorer than that of the other three listeners: even after thousands of trials
with these signals, L4 achieved only 65% correct, while the other three
listeners got more than 90% correct.


Table II

Percent correct classification on initial and retested conditions using the
original signals for each of four listeners. The three retestings were prior
to each of the modulated carrier conditions.

                           Listeners
                    L1      L2      L3      L4
Initial test       89.4    99.6    88.4    65.4
First retest       95.6     --     91.1    70.8
Second retest      94.6     --      --     65.4
Third retest       96.6     --      --     64.1

Table III shows the percent correct on the 24 exemplars and the percent
correct  for  18  of  the  probe  stimuli  for  each  condition.  Eight  of  the  probe 
stimuli  were  eliminated  from  this  analysis  because  they  were  poor  recordings 
or  contained  artifacts  that  created  ambiguity  about  which  event  was  to  be 
judged.  Five  of  these  eight  stimuli  had  been  identified  in  a  previous  study 
(Hanna,  1989);  three  additional  stimuli  with  similar  problems  were  identified 
after  examining  listeners'  responses  to  the  original  (unaltered)  signals  of 
the  present  study.  The  inconsistency  of  listeners'  responses  to  these  stimuli 
was  clearly  attributable  to  the  quality  of  the  signal.  Apparently  three  of 
these  stimuli  were  not  identified  in  the  previous  study  due  to  those  two 
listeners'  prior  familiarity  with  the  signals. 

Table III

Percent correct classification of exemplar and probe signals in each of the
five conditions and for each of four listeners.

Exemplars:
                             Listeners
                        L1      L2      L3      L4
Original               89.4    99.6    88.4    65.4
Envelope               84.7     --     76.6    51.1
Modulated tone         85.8    91.9     --     27.0
Modulated complex      69.3     --      --     20.0
Modulated noise        66.4    57.7     --      --

Probes:
                             Listeners
                        L1      L2      L3      L4
Original               69.4    82.6    67.9    45.8
Envelope               65.4     --     57.7    37.8
Modulated tone         60.1    69.5     --     18.4
Modulated complex      41.5     --      --     20.2
Modulated noise        40.7    36.0     --      --

Percent correct on the probe stimuli is roughly 20 percentage points lower
than on the exemplars. This difference is presumably a reflection of the fact
that the clearest and least ambiguous signals were used as exemplars. It
appears that, for each condition, listeners appropriately generalize the
categories to the probe stimuli even though they have never been given
feedback about the probe stimuli. This result indicates that listeners are
learning categories rather than simply learning arbitrary assignments of
category labels to each of the exemplar stimuli.

With two minor exceptions, performance on the five conditions is
rank-ordered similarly for each listener and for both exemplar and probe
stimuli. Classification is best on the original (unaltered) set of signals,
followed by the envelope condition, the modulated tone condition, the
modulated tonal-complex condition, and the modulated broadband-noise
condition. Listener 4 is noticeably worse than the other three listeners,
particularly for the modulated-carrier conditions.
Performance in the envelope condition was almost as good as for the
original signals. Percent correct was only 4-14% worse when the envelope was
presented as the signal. Moreover, the envelope signals sounded remarkably
similar to the original signals. For two of the three listeners for whom a
comparison is possible, performance on the modulated-tone condition is also
almost as good as for the original signals. For these two listeners (L1 and
L2), percent correct was only 4-13% worse with the modulated tone. The other
listener (L4) was the one who did poorly (near chance) in all of the
modulated-carrier conditions. All listeners showed a marked decrement with
the modulated tonal-complex and the modulated noise carriers. Percent correct
was 20-46% worse in these two conditions than for the original signals.
Although performance was noticeably worse for these two conditions, it should
be noted that L1 and L2 still got 36-69% correct with these
signals (versus chance performance of 12.5%).
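
The decrement ranges quoted above can be approximately recovered from Table III by subtraction. The sketch below transcribes the Table III values as read from the source (cells with no data are simply omitted) and reports the original-minus-condition differences in percentage points. The dictionary layout and function name are illustrative.

```python
CHANCE_PERCENT = 100.0 / 8  # eight equally likely categories

# Percent correct transcribed from Table III; missing cells are omitted.
TABLE_III = {
    "exemplars": {
        "original": {"L1": 89.4, "L2": 99.6, "L3": 88.4, "L4": 65.4},
        "envelope": {"L1": 84.7, "L3": 76.6, "L4": 51.1},
        "tone":     {"L1": 85.8, "L2": 91.9, "L4": 27.0},
        "complex":  {"L1": 69.3, "L4": 20.0},
        "noise":    {"L1": 66.4, "L2": 57.7},
    },
    "probes": {
        "original": {"L1": 69.4, "L2": 82.6, "L3": 67.9, "L4": 45.8},
        "envelope": {"L1": 65.4, "L3": 57.7, "L4": 37.8},
        "tone":     {"L1": 60.1, "L2": 69.5, "L4": 18.4},
        "complex":  {"L1": 41.5, "L4": 20.2},
        "noise":    {"L1": 40.7, "L2": 36.0},
    },
}

def decrements(condition, listeners=None):
    """Original minus condition, in percentage points, for exemplars and probes."""
    out = []
    for stimulus_type in ("exemplars", "probes"):
        original = TABLE_III[stimulus_type]["original"]
        for listener, score in TABLE_III[stimulus_type][condition].items():
            if listeners is None or listener in listeners:
                out.append(original[listener] - score)
    return out

if __name__ == "__main__":
    env = decrements("envelope")
    tone = decrements("tone", listeners=("L1", "L2"))
    broad = decrements("complex") + decrements("noise")
    for name, d in [("envelope", env), ("tone (L1, L2)", tone), ("complex/noise", broad)]:
        print(f"{name}: {min(d):.1f} to {max(d):.1f} points below original")
    print("chance:", CHANCE_PERCENT, "%")
```

Small differences from the rounded ranges in the text are to be expected, since the text reports whole-number ranges.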

Table IV shows the pattern of errors made by L1, for whom we have data
for all conditions. This table shows the percentage of times each response
was given to the exemplar signals from each of the eight categories. For the
envelope and modulated-tone conditions, categories 3, 6, and 8 are the ones
that show the largest decrements relative to the original signals. Most of
the increase in errors is attributable to these three categories. These
results suggest that the envelope does not contain all of the information by
which listeners distinguished categories 3, 6, and 8. For the modulated
tonal-complex and modulated noise conditions, decrements are found for
categories 1, 3, 4, 6, 7, and 8. The errors among stimuli from categories 1,
3, 6, and 8 are common. These errors include those made in the envelope and
modulated-tone conditions as well as additional errors involving categories 1,
4, and 7. Thus, the conditions do not idiosyncratically degrade the
signals; the conditions may be rank-ordered not only by overall level of
performance but also by the types and amount of information that are affected
in the various conditions. The information lost (or masked) in the
modulated-tonal-complex and modulated-noise conditions is in addition to that
lost (or masked) in the envelope and modulated-tone conditions.

We  analyzed  the  patterns  of  errors  using  Multidimensional  Scaling  (MDS)  in 
order  to  identify  perceptual  dimensions  underlying  the  classification  of  these 
sounds  (Kruskal  &  Wish,  1978).  MDS  is  a  scaling  technique  that  places  stimuli 
in  a  multidimensional  space  such  that  the  closeness  of  two  points  in  the  space 
is  correlated  with  the  perceived  similarity  of  the  two  stimuli.  For  our  data, 
the  similarity,  S(i,j),  of  stimuli  i  and  j  was  defined  as: 

$$S(i,j) = \sum_{k=1}^{8} p(i,k)\, p(j,k) \qquad (1)$$

where  p(i,k)  is  the  probability  that  stimulus  i  will  be  called  category  k. 
Table IV

Confusion matrix for L1 for each of the five conditions. Each entry
represents the percentage of times this listener gave a particular response to
exemplar signals from a given category.

Original signals:

                               Response
Signal cat.     1     2     3     4     5     6     7     8
     1         99     -     -     -     -     -     -     1
     2          1    64    31     1     -     -     3     -
     3          -     8    80     3     -     1     8     -
     4          -     -     -    96     -     -     4     -
     5          -     -     -     -   100     -     -     -
     6          -     -     -     4     1    92     1     3
     7          -     -     -     1     -     -    99     -
     8          -     -     1     3     -    10     -    86



Envelope; 

1  2 

Signal  cat. 

1  94  - 

2-75 

3  2  19 

4  1- 

5  -  - 

6  5  - 

7-2 
8  1- 


Response 
3  4  5 

-  1  - 

20  -  - 

70  2  - 

-  98  - 

-  100 
1  -  - 

3  6- 


6 

4 

4 

5 

72 

19 


7 


1 

3 

1 


98 


8 

1 

1 

23 

72 


Modulated tone:

                               Response
Signal cat.     1     2     3     4     5     6     7     8
     1         96     -     2     -     -     2     -     -
     2          1    87     7     1     -     1     3     -
     3          5    13    68     -     -    10     1     3
     4          -     -     -   100     -     -     -     -
     5          -     -     -     -   100     -     -     -
     6          2     2     7     -     -    76     4     9
     7          -     -     -     -     -     -   100     -
     8          3    15    16     -     -     7     -    59


Modulated tonal-complex:

                               Response
Signal cat.     1     2     3     4     5     6     7     8
     1         67     3    18     -     -     1     1    10
     2          4    75    21     -     -     -     -     -
     3         13    16    44     1     -     4     2    20
     4          -     -     -    92     1     -     -     7
     5          -     -     -     4    95     1     -     -
     6          5     3    14     3     -    44     7    24
     7          1     6    13     1     1     1    77     -
     8          1     -     3    21     -    11     2    62

Modulated broadband-noise:

                               Response
Signal cat.     1     2     3     4     5     6     7     8
     1         61     6    15     -     -    11     1     6
     2          7    88     1     2     -     -     2     -
     3         13    10    39    13     -    14     4     6
     4          3     -     1    78     2     4     2    10
     5          -     -     -     1    98     -     1     -
     6         10     6    18     5     -    46     4    11
     7          6     9     3     -     -     1    81     -
     8         10     2     4    16     -    22     4    41

Figure 1. SINDSCAL representation for the twenty-four exemplars. Fig. 1a
plots values on Dimension 4 vs. values on Dimension 1; Fig. 1b plots values on
Dimension 3 vs. values on Dimension 2. The different symbols represent the
eight different stimulus categories. Three exemplars from each category are
shown.


This summed crossproduct corresponds to the probability that the two stimuli
would be given the same category label, assuming independent responses to
each. Similarity matrices defined by Eq. 1 were constructed separately for
each listener and condition. A SINDSCAL MDS analysis (Carroll & Chang, 1970)
of these fourteen matrices (corresponding to the condition-by-listener entries
in Table III) produces a common perceptual space and a weight space that
indicates differential weighting of the dimensions for the fourteen similarity
matrices.
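
As a sketch of how Eq. 1 turns response probabilities into a similarity matrix, the code below computes the summed crossproduct for every pair of stimuli. The response probabilities in the example are hypothetical values chosen for illustration and are not data from the study; the function names are likewise illustrative.

```python
def similarity(p_i, p_j):
    """Eq. 1: sum over categories k of p(i,k) * p(j,k)."""
    if len(p_i) != len(p_j):
        raise ValueError("both stimuli need one probability per category")
    return sum(a * b for a, b in zip(p_i, p_j))

def similarity_matrix(response_probs):
    """Symmetric matrix of S(i,j) for a list of per-stimulus probability rows."""
    n = len(response_probs)
    return [[similarity(response_probs[i], response_probs[j]) for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    # Hypothetical response probabilities for three stimuli over eight categories.
    probs = [
        [0.9, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.2, 0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    ]
    for row in similarity_matrix(probs):
        print([round(s, 3) for s in row])
```

Two stimuli that are always given different labels have a similarity of zero, and a stimulus that is always given the same label has a similarity of one with itself. The MDS fitting step itself (SINDSCAL) is not reproduced here.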

A four-dimensional solution accounted for 61.3% of the variance.
Higher-dimensional solutions provided only a moderately better fit (65.4% and
67.3% for five- and six-dimensional solutions, respectively). Decreasing the
dimensionality to three caused a larger change in explained variance, down to
52.2%. Using the typical criterion for dimensionality, we took the relatively
sharp decrement in explained variance as dimensionality decreased from four to
three as an indication that four dimensions are appropriate to describe the
data. This solution uses about 256 parameters (4 dimension values for 50
stimuli plus 4 weights for 14 matrices) to predict 5600 data values (8
response probabilities for 50 stimuli for 14 matrices). Repeated analyses
with different starting configurations gave similar results.
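
The parameter and data counts, and the change in explained variance with dimensionality, follow from simple arithmetic on the figures given above. The sketch below reproduces that arithmetic; the names are illustrative, and the gain-per-dimension comparison is only one informal way of expressing the criterion described in the text.

```python
N_STIMULI = 50
N_MATRICES = 14      # condition-by-listener entries in Table III
N_CATEGORIES = 8

# Percent of variance accounted for, by number of dimensions (from the text).
VARIANCE_EXPLAINED = {3: 52.2, 4: 61.3, 5: 65.4, 6: 67.3}

def parameter_count(n_dims):
    """Stimulus coordinates plus per-matrix dimension weights."""
    return n_dims * N_STIMULI + n_dims * N_MATRICES

def data_value_count():
    """Response probabilities per stimulus, per matrix."""
    return N_CATEGORIES * N_STIMULI * N_MATRICES

def gain_per_added_dimension(variance):
    """Increase in explained variance when going from d-1 to d dimensions."""
    return {d: variance[d] - variance[d - 1]
            for d in sorted(variance) if d - 1 in variance}

if __name__ == "__main__":
    print(parameter_count(4), "parameters for", data_value_count(), "data values")
    for d, gain in gain_per_added_dimension(VARIANCE_EXPLAINED).items():
        print(f"{d - 1} -> {d} dimensions: +{gain:.1f} points")
```

The fourth dimension adds roughly nine points of explained variance, while the fifth and sixth add about four and two, which is the pattern the authors cite for stopping at four.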

Figure 1 shows the obtained SINDSCAL representation. Figure 1a plots
values on MDS dimensions 1 and 4; Figure 1b plots MDS dimensions 2 and 3.
Each point represents a single stimulus. For clarity only the 24 exemplars
are shown. Different symbols are used for each of the eight stimulus
categories. Each symbol appears three times in each plot to represent the
three exemplars from each category.


Figure 2. Dimensional weighting coefficients for the classification of the
original signals, plotted against MDS dimension number. The different symbols
represent the four listeners.

Examination of how stimuli varied along the individual dimensions
suggested perceptual correlates for each of the four dimensions (the first
attribute corresponds to a positive value on the given dimension): Dimension
1, heavy vs. light; Dimension 2, rough vs. nonrough; Dimension 3,
extended vs. compact in duration; Dimension 4, soft vs. hard.

Figure 2 shows the dimension weights for individual listeners for the
original signals. L1, L2, and L3 have similar weightings, but L4 gives a much
lower weight to Dimension 4. L4's lower performance can be attributed to an
insensitivity to this dimension. Figure 3 shows the dimension weights for L1
for each of the five conditions. Again, Dimension 4 shows the greatest
differences, with greatly reduced weight in the modulated tone condition and
almost no weight in the modulated tonal-complex and the modulated-noise
conditions. In addition, Dimension 3 receives less weight in the modulated
tonal-complex and modulated-noise conditions. These decreases in weights can
be interpreted as a reduction or masking of the cues underlying these
dimensions. This reduced sensitivity to Dimensions 3 and 4 is compensated for
by a relatively greater weight given to Dimensions 1 and 2. These results are
consistent with the conclusion that the cues lost in the
modulated-tonal-complex and modulated-noise conditions are in addition to
those lost in the envelope and modulated-tone conditions.


Figure 3. Dimensional weighting coefficients for L1, plotted against MDS
dimension number. The different symbols represent the five stimulus
conditions.

DISCUSSION


Although the number of listeners and conditions tested in the current
study is limited, the results serve to demonstrate that envelope features may
have a very significant role for aural classification of brief nonspeech
sounds. As indicated by performance in the envelope and modulated-tone
conditions, sufficient information exists in the envelope to classify these
brief nonspeech sounds very well. It would seem that the very low modulation
rates, less than 80 Hz, are not critical because listeners could classify
signals in the envelope condition where frequencies less than 80 Hz are
relatively inaudible. However, low modulation rates may still be very
important after including auditory processing effects. The auditory system
may be insensitive to modulation rates of 800 and 820 Hz, but may easily hear
the 20 Hz intermodulation which exists between them. Thus, low frequency
modulation that is not present as a discrete component may be carried by high
frequency modulation rates.

The  poorer  performance  in  the  modulated  tonal-complex  and  modulated 
broadband-noise  conditions  is  likely  due  to  interaction  of  sidebands  from 
nearby  carrier  frequencies.  For  low  modulation  rates,  perhaps  less  than  50 
Hz,  these  interactions  would  probably  not  interfere  much  with  the  coherent 
modulation  of  the  carrier  frequencies.  For  larger  modulation  rates,  the 
sidebands  will  be  distributed  widely,  producing  complex  sets  of 
intermodulations  at  modulation  rates  that  are  lower  than  the  coherent 
modulation.  The  effects  of  critical  band  filtering  and  limited  temporal 
resolution  would  provide  greater  emphasis  to  the  intermodulations  than  the 
coherent  modulation,  resulting  in  a  significantly  different  modulation  pattern 
than  in  the  original  signal.  These  interactions  would  also  reduce  the 
modulation  depth  of  the  envelope  and  make  its  lower-frequency  components  less 
discernible. Whether the poorer performance with these broadband carriers is
due  to  the  loss  of  the  higher  modulation  rates  or  simply  the  masking  of  the 
lower  modulation  rates  remains  to  be  determined. 
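
The sideband argument can be illustrated with a deliberately simplified model: a single sinusoidal modulator at rate fm applied to each tone of the tonal-complex carrier, which places components at fc - fm and fc + fm around every carrier fc. Real envelopes contain many modulation rates at once, so this is only a sketch under that assumption. The carrier list is taken from the Apparatus section; the function names are illustrative.

```python
# Carrier frequencies of the tonal complex, as listed in the Apparatus section.
CARRIERS_HZ = [500.0, 650.0, 800.0, 1000.0, 1250.0, 1600.0,
               2000.0, 2500.0, 3150.0, 4000.0, 5000.0]

def sidebands(carrier_hz, fm_hz):
    """Lower and upper sideband of one carrier under sinusoidal modulation."""
    return carrier_hz - fm_hz, carrier_hz + fm_hz

def nearest_sideband_gaps(carriers_hz, fm_hz):
    """Gap between the upper sideband of each carrier and the lower sideband
    of the next carrier up. Small gaps mean slow beats between sidebands."""
    gaps = []
    for lo, hi in zip(carriers_hz, carriers_hz[1:]):
        _, upper_of_lo = sidebands(lo, fm_hz)
        lower_of_hi, _ = sidebands(hi, fm_hz)
        gaps.append(abs(lower_of_hi - upper_of_lo))
    return gaps

def sideband_crossover_hz(carriers_hz):
    """Lowest modulation rate at which sidebands of adjacent carriers coincide."""
    return min(hi - lo for lo, hi in zip(carriers_hz, carriers_hz[1:])) / 2.0

if __name__ == "__main__":
    print("crossover rate:", sideband_crossover_hz(CARRIERS_HZ), "Hz")
    for fm in (20.0, 60.0):
        print(f"fm = {fm} Hz, smallest gap:", min(nearest_sideband_gaps(CARRIERS_HZ, fm)), "Hz")
```

In this model, a 60-Hz modulation leaves sidebands of the 500-Hz and 650-Hz carriers only 30 Hz apart, which would beat at a rate lower than the modulation itself. That is consistent with the suggestion above that higher modulation rates produce intermodulations at lower rates than the coherent modulation.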

The  SINDSCAL  analysis  provided  four  dimensions  with  perceptual  correlates 
that  are  potentially  related  to  sound  source  properties,  such  as  hardness  or 
weight.  Gibson  (1979)  suggested  that  perceptual  information  is  organized 
according  to  its  specification  of  object  properties.  The  current  results  are 
consistent  with  Gibson's  theory.  Warren  and  Verbrugge  (1984)  and  Richards  and 
Ullman  (1988)  also  suggest  that  temporal  features  can  provide  information 
about  object  properties.  A  psychoacoustic  theory  of  classification  would 
specify  the  acoustic  features  by  which  object  properties  are  auditorily 
determined. Results with the modulated carriers are first steps towards
identifying acoustic correlates in that the presence of Dimension 3 and 4 cues
is significantly reduced. Apparently the sideband interactions with
broadband carriers affected the cues underlying Dimensions 3 and 4 but not
those cues underlying Dimensions 1 and 2.

In  summary,  four  dimensions  have  been  identified  for  aural  classification 
of  a  set  of  brief  underwater  signals.  Various  methods  of  presenting  envelope 
information  aurally  suggest  that  the  envelope  contains  important  features  for 
aural  classification  of  these  signals.  Listeners  were  insensitive  to  some  of 
these  features  under  certain  modulated-carrier  conditions,  but  an  acoustic 
analysis of the important envelope features and modulation rates will
require  further  data  and  reference  to  auditory  models  of  modulation 
sensitivity. 